Stationary optimal process in discounted dynamic programming
Authors
Abstract
Similar sources
Constrained Discounted Dynamic Programming
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semi-continuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a fe...
Determining Optimal Stationary Strategies for Discounted Stochastic Optimal Control Problem on Networks
The stochastic version of discrete optimal control problem with infinite time horizon and discounted integral-time cost criterion is considered. This problem is formulated and studied on certain networks. A polynomial time algorithm for determining the optimal stationary strategies for the considered problems is proposed and some applications of the algorithm for related Markov decision problem...
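The abstract above concerns computing optimal stationary strategies for discounted problems. As a hedged illustration of the underlying idea (not the paper's network-specific polynomial algorithm), the following sketch applies standard value iteration to a toy finite Markov decision process; the transition matrix `P`, reward matrix `R`, and discount factor are all invented for the example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Compute an optimal stationary policy for a finite discounted MDP.

    P: transition probabilities, shape (A, S, S), P[a, s, s'] (toy data below)
    R: immediate rewards, shape (A, S)
    gamma: discount factor in (0, 1)
    Returns (value function V, stationary policy pi).
    """
    V = np.zeros(P.shape[1])
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Illustrative two-state, two-action MDP (numbers are assumptions)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, pi = value_iteration(P, R)
```

The returned `pi` is stationary: it depends only on the current state, which is exactly the class of strategies the abstract's algorithm searches over.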
Smooth Value and Policy Functions for Discounted Dynamic Programming
We consider a discounted dynamic program in which the spaces of states and actions are smooth (in a sense that is suitable for the problem at hand) manifolds. We give conditions that ensure that the optimal policy and the value function are smooth functions of the state when the discount factor is small. In addition, these functions vary in a Lipschitz manner as the reward function-discount fac...
Discounted Optimal Stopping Problems for the Maximum Process
The maximality principle [6] is shown to be valid in some examples of discounted optimal stopping problems for the maximum process. In each of these examples explicit formulas for the value functions are derived and the optimal stopping times are displayed. In particular, in the framework of the Black-Scholes model, the fair prices of two lookback options with infinite horizon are calculated. T...
Gaussian process dynamic programming
Reinforcement learning (RL) and optimal control of systems with continuous states and actions require approximation techniques in most interesting cases. In this article, we introduce Gaussian process dynamic programming (GPDP), an approximate value-function based RL algorithm. We consider both a classic optimal control problem, where problem-specific prior knowledge is available, and a classic...
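GPDP, as summarized above, approximates the value function with Gaussian process regression. The snippet below is a minimal, self-contained sketch of that building block only: GP regression with an RBF kernel, fitted through a few hypothetical (state, value) samples. The kernel lengthscale, noise level, and training data are all assumptions for illustration, not parameters from the paper.

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, lengthscale=1.0, noise=1e-6):
    """GP regression mean with an RBF kernel (the value-function
    approximator that GPDP-style methods build on)."""
    def rbf(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / lengthscale) ** 2)
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)       # K^{-1} y
    return rbf(X_test, X_train) @ alpha       # posterior mean

# Hypothetical value samples over a 1-D state space
X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.sin(X)                                 # stand-in "values"
mean = gp_predict(X, y, np.array([0.75]))
```

Because the predictor is smooth between training states, a handful of samples yields a value estimate over the whole continuous state space, which is what makes the approach usable for continuous-state RL.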
Journal
Journal title: Applicationes Mathematicae
Year: 1977
ISSN: 1233-7234, 1730-6280
DOI: 10.4064/am-15-4-475-487